A system manager dynamically controls a diffusion process $Z$ that
lives in a finite interval $[0,b]$. Control takes the form of a negative
drift rate $\theta$ that is chosen from a fixed set $A$ of available values.
The controlled process evolves according to the differential relationship
$dZ = dX - \theta(Z)\,dt + dL - dU$, where $X$ is a $(0,\sigma)$ Brownian
motion, and $L$ and $U$ are increasing processes that enforce a lower
reflecting barrier at $Z = 0$ and an upper reflecting barrier at $Z = b$,
respectively. The cumulative cost process increases according to the
differential relationship $d\xi = c(\theta(Z))\,dt + p\,dU$, where $c(\cdot)$
is a nondecreasing cost of control and $p > 0$ is a penalty rate associated
with displacement at the upper boundary. The objective is to minimize
long-run average cost. This problem is solved explicitly, which allows
one to also solve the following, essentially equivalent formulation:
minimize the long-run average cost of control subject to an upper
bound constraint on the average rate at which $U$ increases. The two
special problem features that allow an explicit solution are the use
of a long-run average cost criterion, as opposed to a discounted cost
criterion, and the lack of state-related costs other than boundary
displacement penalties. The application of this theory to power control
in wireless communication is discussed.

1. Introduction and summary. In this paper we formulate and solve a
one-dimensional Brownian control problem that arises in queueing theory.
To be more specific, it serves to approximate the dynamic control problem
portrayed in Figure 1. Here jobs or customers arrive at an average rate of
$\lambda$, and they are served at an average rate of $\mu$ that can be varied
dynamically based on system status. The interarrival and service time
distributions can be general, since we ultimately study a diffusion
approximation where only the first two moments of the underlying
distributions are relevant. In the model formulation that we consider, new
arrivals are rejected when a finite buffer capacity is exceeded, and
congestion costs come in the form of penalties for such rejections. As we
shall explain later, one can think of the finite buffer capacity as either a
physical parameter or a policy parameter, and in the latter case it may be
viewed as an upper bound on the throughput times experienced by accepted
customers. In addition to the penalty cost per rejection, there is a cost
rate that increases with $\mu$. The system manager's problem is to choose
$\mu$ as a function of the current queue length so as to minimize the
long-run average cost incurred per time unit, referred to hereafter as
simply the average cost.

[Figure omitted: exogenous arrivals enter a buffer of finite capacity;
arrivals are rejected at a fixed cost per rejection when the buffer is full;
the system manager chooses the service rate $\mu$, with an associated cost
rate.]

Fig. 1. A processing system model.

The approximating Brownian control problem that we study here is portrayed
in Figure 2. The state of the system at time $t \ge 0$ is given by a variable
$Z(t)$ that one interprets as a scaled version of the queue length (or buffer
content) in the original model. The controlled stochastic process $Z$ has the
following form:

(1) $Z(t) = Z(0) + X(t) - \int_0^t \theta(Z(s))\,ds + L(t) - U(t), \quad t \ge 0.$

[Figure omitted: the process $Z(t)$ on $[0,b]$; penalty rate $p$ per unit of
boundary displacement at the upper boundary; in the interior, choose a
negative drift rate $\theta$ and incur cost $c(\theta)$; costless boundary
displacement at the lower boundary.]

Fig. 2. Brownian control problem.

Here $X = \{X(t), t \ge 0\}$ is a Brownian motion with drift parameter zero
and variance parameter $\sigma^2 > 0$, and $\theta(\cdot)$ is a
state-dependent negative drift rate that represents the system manager's
control policy. Also, $Z(0) \in [0,b]$ is the initial backlog of work to be
processed (a fixed constant), and $L$ and $U$ are "pushing processes"
associated with the lower boundary $Z = 0$ and upper boundary $Z = b$,
respectively. To be more specific, $L(\cdot)$ and $U(\cdot)$ increase in the
minimal amounts sufficient to ensure $Z(t) \in [0,b]$ for all $t \ge 0$, which
can be expressed mathematically as follows:

(2) $Z(t) \in [0,b], \quad t \ge 0,$

(3) $L(\cdot)$, $U(\cdot)$ are nondecreasing and continuous with $L(0) = U(0) = 0$,

(4) $\int_0^t 1_{\{Z(s) > 0\}}\,dL(s) = \int_0^t 1_{\{Z(s) < b\}}\,dU(s) = 0, \quad t \ge 0.$

Using the terminology that is standard in diffusion theory, the nondecreasing
processes $L$ and $U$ serve to enforce a lower "reflecting" barrier at $Z = 0$
and an upper "reflecting" barrier at $Z = b$, respectively, given the chosen
control policy $\theta(\cdot)$. The system model embodied in (1)-(4)
generalizes the finite-buffer model described and analyzed in Chapter 5 of
[9], the generalization being to state-dependent drift. With the cost
structure considered here, the cumulative cost incurred over the time
interval $[0,t]$ is

(5) $\xi(t) = \int_0^t c(\theta(Z(s)))\,ds + pU(t), \quad t \ge 0,$

and the system manager's objective is to

(6) minimize $\gamma = \lim_{t \to \infty} \frac{1}{t} E[\xi(t)].$
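As an informal illustration of (1)-(6), the following Python sketch simulates
the controlled process with a simple Euler scheme in which any overshoot of a
boundary is pushed back and accumulated into $L$ or $U$. The step size, the
function and parameter names, and the example policy and cost function are
assumptions made purely for illustration; the discretization only
approximates the reflected diffusion and is not part of the analysis in this
paper.

```python
import math
import random


def simulate(theta, cost, p, b, sigma, z0, horizon, dt=1e-3, seed=0):
    """Euler sketch of (1)-(5): returns (Z path, L(T), U(T), xi(T)).

    theta : state-dependent negative drift rate, a function on [0, b]
    cost  : cost of control c(.), evaluated at theta(z)
    All names and the step size dt are illustrative assumptions.
    """
    rng = random.Random(seed)
    z, lower_push, upper_push, total_cost = z0, 0.0, 0.0, 0.0
    path = [z]
    for _ in range(int(round(horizon / dt))):
        drift = theta(z)
        total_cost += cost(drift) * dt            # c(theta(Z)) dt term of (5)
        z += sigma * math.sqrt(dt) * rng.gauss(0.0, 1.0) - drift * dt
        if z < 0.0:                               # push up at the lower barrier
            lower_push += -z
            z = 0.0
        elif z > b:                               # push down at the upper barrier
            upper_push += z - b
            z = b
        path.append(z)
    total_cost += p * upper_push                  # p U(t) term of (5)
    return path, lower_push, upper_push, total_cost


# Example with assumed data: constant policy theta = 0.5, quadratic cost, p = 2.
path, L_T, U_T, xi_T = simulate(lambda z: 0.5, lambda x: x * x,
                                p=2.0, b=1.0, sigma=1.0, z0=0.5, horizon=50.0)
print("estimated average cost:", xi_T / 50.0)
```

The ratio `xi_T / horizon` is only a crude finite-horizon estimate of the
long-run average cost in (6); a smaller step size and a longer horizon would
be needed for any quantitative use.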

Later in the paper we shall describe the application of our theory to power 
control in wireless communication. 

Under very mild assumptions on the cost function $c$ (see Section 2), we
derive an explicit solution for the Brownian control problem described above:
For arbitrary $p > 0$, an optimal control policy
$\{\theta(z,p) : z \in [0,b]\}$ is given by (28) in Section 3.

One important antecedent of this paper is the work of George and Har- 
rison [8] on dynamic control of the service rate in a Markovian queueing 
model. Of course, their problem has a discrete rather than continuous state 
space, and the cost structure assumed in [8] differs in certain important ways 
from what we consider here, but as readers will see in Section 2 below, some 
aspects of the George-Harrison analysis carry over directly to our setting. 

Because the Brownian control problem considered here has reflecting
barriers, and moreover has cost associated with "pushing" at one of those
barriers, there is a certain amount of commonality with the theory of
"singular" stochastic control that was initiated by the work of Beneš, Shepp
and Witsenhausen [4]; see Chapter 6 of [9] for an elementary example of
singular control. However, the positioning of the reflecting barriers is not
discretionary in our model, and so it is more properly associated with the
classical theory of drift rate control for diffusions; see [11] or [7].

A distinguishing feature of the problem formulation studied here is its 
use of an average cost optimality criterion, as opposed to the discounted 
cost criterion that predominates in the stochastic control literature. One 
can express this by saying that we take the interest rate for discounting 
to be zero, or that we consider only the limiting case as the interest rate 
approaches zero. That restriction is motivated by tractability considerations: 
we are able to derive an explicit solution under an average cost criterion, 
but our formulas cannot be extended in any obvious way to the general 
discounted cost criterion. Stochastic control with an average cost criterion 
is also called "ergodic control" [11] and "stationary control" [3]. 

The remainder of the paper is structured as follows. Section 2 lays out our
assumptions on the cost of control $c(\cdot)$ and then compiles various
preliminary results that are used in later analysis. Section 3 contains the
precise mathematical statement and explicit solution of our Brownian control
problem where there is a penalty rate $p > 0$ for rejections. Finally,
Section 4 describes the power control application mentioned earlier in this
introduction, where rejected customers correspond to dropped data packets in
a wireless communication system. In that context an apparently different but
essentially equivalent problem formulation is natural. To explain the
alternative formulation, we need additional notation: under any policy worthy
of consideration there exists a constant $\beta > 0$ such that

(7) $\frac{1}{t} E[U(t)] \to \beta$ as $t \to \infty$.

In the wireless communication context, $\beta$ represents (a scaled version
of) the packet drop rate, and it is natural to impose a performance
constraint $\beta \le \bar\beta$, where $\bar\beta > 0$ is a given constant,
rather than specifying a cost per dropped packet. In Section 4 we shall
explain how this formulation can be reduced to our original one by
"dualizing" the performance constraint.

As often happens in the analysis of specific stochastic control problems, we 
find that existing foundational theory is not quite suitable for our purposes. 
For example, we cannot point to a standard reference work that states and 
rigorously justifies a Bellman equation (providing an analytical characteri- 
zation of optimal controls) for a class of problems general enough to include 
our model. Thus at several points we develop minor variants of standard 
textbook theory and then justify those variants from first principles. The 
style of argument that we use is completely standard, however, so no
contribution to general theory can be claimed. Rather, the contribution of
this paper is to solve explicitly a well-motivated problem of optimal
stochastic control, which is only possible because of the problem's special
structure.

[Figure omitted: graph of an illustrative cost function $c(x)$ against $x$.]

Fig. 3. An illustrative cost function.

2. Cost of control and related quantities. This section first specifies our
assumptions regarding the cost of control $c(\cdot)$, and then develops two
propositions that are needed for purposes of stating and proving the paper's
main results (see Sections 3 and 4). The domain of the function $c$ (i.e.,
the set of possible negative drift rates $\theta$) can be any closed subset
$A$ of $\mathbb{R}$ that has a smallest element $\theta_*$, and $c$ is assumed
to be nondecreasing and left-continuous on $A$ with $c(\theta_*) = 0$. The
last requirement is just a convenient normalization; if one starts with a
model where $c(\theta_*) = 0$ and then adds a constant to $c(x)$ for all
$x \in A$, the optimal control policy is not changed but the associated
average cost is increased by that constant. To eliminate uninteresting
complications, we assume that $c(x) > 0$ for all $x > \theta_*$. If $A$ is
unbounded, we further require that

(8) $\inf\{ c(x)/x : x \in A,\ x \ge y \} \uparrow \infty$ as $y \uparrow \infty$.

Figure 3 shows an illustrative cost function whose domain is
$A = [\theta_1, \theta_2] \cup [\theta_3, \theta_4]$. Let us denote by
$\hat c(\cdot)$ the greatest convex function on the extended domain
$\hat A = [\theta_*, \infty)$ such that $c(\cdot) \ge \hat c(\cdot)$ on $A$,
calling this the effective cost rate function for reasons explained below.
For the example portrayed in Figure 3, the effective cost rate function is
given by the dashed line on $[\theta_1, \theta_4]$ and then
$\hat c(x) = \infty$ for $x > \theta_4$. It will be seen that the optimal
solution of our Brownian control problem remains the same if $A$ is replaced
by $\hat A$ and $c$ by $\hat c$. Exactly as in Section 4 of [8], let

(9) $\phi(y) = \sup_{x \in A}\{yx - c(x)\}$ for $y \ge 0$.

As observed on page 724 of [8], it is straightforward to prove the following:
first, the supremum in (9) is finite for all $y \ge 0$, and second, there
exists a smallest $x^* \in A$ that achieves the supremum. Hereafter, that
smallest maximizer will be denoted by $\psi(y)$, as in [8]. That is,

(10) $\psi(y) = \inf \arg\max_{x \in A}\{yx - c(x)\}$ for $y \ge 0$.

The assumption stated in (8) is used in an essential way to prove the
assertions made immediately above when $A$ is unbounded; see [8]. Various
other properties of $\phi$ and $\psi$ are proved in Section 5 of [8],
including the following. (Here the integral is defined in the ordinary
Riemann sense.)

Proposition 1. $\psi(\cdot)$ is nondecreasing and left-continuous on
$[0,\infty)$ and

(11) $\phi(y) = \int_0^y \psi(u)\,du$ for $y \ge 0$.

Also, as observed in Section 5 of [8], it is easy to see that $\phi(\cdot)$
is a convex function on $[0,\infty)$. The following properties of
$\psi(\cdot)$ and $\phi(\cdot)$ will be needed in what follows. Detailed
proofs (tedious but straightforward exercises in real analysis) are provided
in [2].

Proposition 2. We have the following:

(i) $\psi(\cdot)$ is right-continuous at zero;

(ii) $\phi_* = -\inf\{\phi(y) : y \ge 0\} < \infty$ if and only if $A$ has a
nonnegative element;

(iii) if $A$ is unbounded, then $\psi(y) \to \infty$ and
$\phi(y) \to \infty$ as $y \to \infty$;

(iv) if $A$ is bounded, then $\psi(y) \uparrow \theta^*$ as $y \to \infty$,
where $\theta^* = \sup A$.

In the analysis that follows, readers will see that the function $\psi$
efficiently captures all aspects of the cost rate function $c$ that are
relevant for our purposes. As an aid to intuition, it is useful to consider
the special case where $A = [\theta_*, \infty)$ and $c$ is strictly convex,
nondecreasing and continuously differentiable on $A$, defining
$y_* = c'(\theta_*)$ to ease notation. In this case $\psi(y) = \theta_*$ if
$0 \le y \le y_*$, and $\psi(\cdot)$ is the inverse of $c'(\cdot)$ on
$[y_*, \infty)$.

In general, readers can easily verify that $\psi$ remains the same if we
replace $c(\cdot)$ by its convex hull $\hat c(\cdot)$. Also, denoting by
$A^*$ the set of all $x \in A$ such that $c(x) = \hat c(x)$, it is shown in
Section 5 of [8] that $\psi(y) \in A^*$ for all $y \ge 0$.
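To make the definitions (9) and (10) concrete, the following Python sketch
evaluates $\phi$ and $\psi$ by direct enumeration when the action set is
finite. The action set, the cost function and the helper name `phi_psi` are
hypothetical choices made for illustration only.

```python
def phi_psi(y, actions, cost):
    """Return (phi(y), psi(y)) from (9)-(10) for a finite action set."""
    best_value, best_x = None, None
    for x in sorted(actions):
        value = y * x - cost(x)
        # Strict '>' keeps the smallest maximizer when there is a tie.
        if best_value is None or value > best_value:
            best_value, best_x = value, x
    return best_value, best_x


actions = [0, 1, 2, 3]          # assumed example with smallest element 0
cost = lambda x: x * x          # zero at the smallest element, nondecreasing on A

for y in [0, 1, 2, 3, 4]:
    print(y, phi_psi(y, actions, cost))
```

In this example $\psi$ is a nondecreasing step function that jumps at the tie
points $y = 1$ and $y = 3$, where the enumeration returns the smaller
maximizer, in line with the left-continuity asserted in Proposition 1.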
3. Problem formulation and its solution. We now consider the first Brownian
control problem described in Section 1, where there is a penalty rate
$p > 0$ associated with "pushing" at the upper barrier $Z = b$.

3.1. Admissible control policies. To minimize technical complexity, we
shall restrict attention to stationary, Markov control policies
$\theta(\cdot)$ as in Section 1. That is, the negative drift rate chosen at
any time $t$ is assumed to depend on past history only through the observed
value $Z(t)$. (Presumably our analysis could be extended to allow more
general, nonstationary and history-dependent policies, but that avenue will
not be explored here.) An admissible control policy is defined as a bounded
measurable function $\theta : [0,b] \to A$.

We must associate with each such policy a set of processes $(X, Z, L, U)$
that satisfy the relationships (1)-(4). To be more precise, we need to
associate with each admissible policy $\theta(\cdot)$ a solution of the
following Skorohod problem. (This problem may also be described as one of
solving a stochastic differential equation subject to reflecting boundary
conditions.) First, $X$ is a Brownian motion with zero drift and variance
parameter $\sigma^2 > 0$ and $X(0) = 0$ almost surely on some filtered
probability space $(\Omega, \mathcal{F}, P; \mathcal{F}_t, t \ge 0)$. Second,
$X$ is a martingale with respect to the given filtration. Finally, the
processes $Z$, $L$ and $U$ are defined on the same probability space as $X$,
are adapted to the filtration and together with $X$ satisfy (1)-(4).
Hereafter we shall summarize this state of affairs by saying that
$(X, Z, L, U)$ is a solution of the Skorohod problem associated with
$\theta(\cdot)$.

Because of our restriction to bounded control policies, standard theory
guarantees that the Skorohod problem for any admissible $\theta(\cdot)$ has a
solution, and that the joint distribution of $(X, Z, L, U)$ is unique: the
case where $\theta(\cdot)$ is constant is treated, for example, in Chapter 5
of [9], and using that theory as a foundation one can prove both existence
and uniqueness in distribution for the general case using Girsanov's theorem
on change of measure for Brownian motion; see pages 302-306 of [10].

Throughout the remainder of this section let $\theta(\cdot)$ be a fixed
admissible policy, and let $(X, Z, L, U)$ be a solution of the associated
Skorohod problem. Also, let $C^2[0,b]$ be the space of functions
$f : [0,b] \to \mathbb{R}$ that are twice continuously differentiable up to
the boundary (i.e., $f$ is twice continuously differentiable on the interior
of the interval, and its first and second derivatives both approach finite
limits at the end points), and define the differential operator $T$ on
$C^2[0,b]$ via

(12) $Tf(z) = \tfrac{1}{2}\sigma^2 f''(z) - \theta(z) f'(z)$ for $z \in [0,b]$.

Because $\{f'(Z(t)), t \ge 0\}$ is a bounded process, a routine application
of Ito's formula gives the following identity for any $f \in C^2[0,b]$ and
$t \ge 0$:

(13) $E[f(Z(t))] - f(Z(0)) = E\left[\int_0^t Tf(Z(s))\,ds + f'(0)L(t) - f'(b)U(t)\right];$

see Chapter 5 of [9]. Now let $\gamma$ be a constant such that

(14) $\gamma \le Tf(z) + c(\theta(z))$ for all $z \in (0,b)$,

and suppose that $f$ further satisfies the boundary conditions

(15) $f'(0) = 0$ and $f'(b) = p$.

Defining the cumulative cost process $\xi$ as in (5), we combine (13)-(15) to
obtain the following:

(16) $E[f(Z(t))] - f(Z(0)) \ge \gamma t - E[\xi(t)]$ for all $t \ge 0$.

Dividing both sides of (16) by $t$ and letting $t \to \infty$ gives
Proposition 3 below, and Proposition 4 is proved using the obvious
modification of this argument.

Proposition 3. If $f$ and $\gamma$ satisfy (14) and (15), then

(17) $\liminf_{t \to \infty} \frac{1}{t} E[\xi(t)] \ge \gamma$.

Proposition 4. If (14) holds with equality for all $z \in (0,b)$ and (15)
also holds, then

(18) $\lim_{t \to \infty} \frac{1}{t} E[\xi(t)] = \gamma$.

3.2. The Bellman equation. Together, Propositions 3 and 4 motivate the
following Bellman equation as a means of characterizing an optimal policy
analytically: find a function $f \in C^2[0,b]$ and a constant $\gamma$ that
jointly satisfy

(19) $\gamma = \min_{x \in A}\{\tfrac{1}{2}\sigma^2 f''(z) - x f'(z) + c(x)\}$ for all $z \in (0,b)$,

along with the boundary conditions (15). This is a nonlinear ordinary
differential equation. Bellman equations of similar form have been derived
for similar problems of ergodic control in many previous works; see page 65
of [13]. We shall develop an explicit solution $(f, \gamma)$ for this
differential equation, then define our candidate policy as the one that
chooses in each state $z$ a negative drift rate $\theta(z)$ equal to the
smallest minimizer $x$ in (19), and then use Propositions 3 and 4 to verify
that the candidate policy is optimal. The calculations in the following
paragraph are purely formal; the rigorous verification of our solution will
be provided in Section 3.4.

Of course, (19) is really a first-order equation, because it does not involve
the unknown function $f$ itself. Setting $v(z) = f'(z)$ for $z \in [0,b]$ and
recalling the definition (9) of $\phi$, we can rewrite (19) as

(20) $\gamma = \tfrac{1}{2}\sigma^2 v'(z) - \phi(v(z))$ for $z \in (0,b)$.

Equivalently, (20) can be written as

(21) $\frac{1}{2}\sigma^2 \frac{v'(y)}{\phi(v(y)) + \gamma} = 1$ for $y \in (0,b)$.

Now we integrate both sides of (21) with respect to Lebesgue measure over
the interval $(0,z)$, then make the change of variable $u = v(y)$ and use the
boundary condition $v(0) = f'(0) = 0$ from (15) to arrive at the following:

(22) $\frac{1}{2}\sigma^2 \int_0^{v(z)} \frac{du}{\phi(u) + \gamma} = z$ for $z \in (0,b)$.

Our problem now is to choose the constant $\gamma$ so that the function
$v(\cdot)$ defined by (22) satisfies the second boundary condition
$v(b) = f'(b) = p$ in (15). This task is undertaken in the next section,
where we emphasize the parametric dependence of our solution on the penalty
rate $p$ in order to facilitate future analysis.

3.3. Solving the Bellman equation. For each $p > 0$, let
$\phi_*(p) = -\min\{\phi(y) : y \in [0,p]\}$, which is finite and is achieved
because $\phi(\cdot)$ is continuous over $[0,p]$. Also, we define a function
$F(\cdot,p) : (\phi_*(p), \infty) \to \mathbb{R}$ for each $p > 0$ via

(23) $F(\gamma,p) = \int_0^p \frac{du}{\phi(u) + \gamma}$.

The proof of the following proposition is straightforward but lengthy, with
several separate cases requiring consideration; the details are spelled out
in Appendix B.2 of [2].

Proposition 5. For each fixed $p > 0$, the function $F(\cdot,p)$ is
continuous and strictly decreasing on $(\phi_*(p), \infty)$ with

(24) $\lim_{\gamma \downarrow \phi_*(p)} F(\gamma,p) = \infty$ and $\lim_{\gamma \uparrow \infty} F(\gamma,p) = 0$.

The following result is an immediate consequence of Proposition 5. The
inverse relationship that defines $\gamma(\cdot)$ is shown graphically in
Figure 4.

Corollary 1. For each $p > 0$ there exists a unique
$\gamma(p) \in (\phi_*(p), \infty)$ such that

(25) $\tfrac{1}{2}\sigma^2 F(\gamma(p),p) = b$.
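Because $F(\cdot,p)$ is continuous and strictly decreasing, the constant
$\gamma(p)$ in (25) can be located numerically by bisection. The Python
sketch below does this under assumptions that are not part of the paper: a
quadratic cost $c(x) = x^2$ on $A = [0,\infty)$, for which
$\phi(u) = u^2/4$ and $\phi_*(p) = 0$, a composite Simpson rule for the
integral in (23), and hypothetical helper names and parameter values.

```python
import math


def simpson(g, a, b, n=2000):
    """Composite Simpson rule for the integral of g over [a, b] (n even)."""
    h = (b - a) / n
    total = g(a) + g(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * g(a + i * h)
    return total * h / 3.0


def F(gamma, p, phi):
    """F(gamma, p) from (23), evaluated by numerical quadrature."""
    return simpson(lambda u: 1.0 / (phi(u) + gamma), 0.0, p)


def solve_gamma(p, b, sigma2, phi, phi_star=0.0, iters=80):
    """Bisection for gamma(p) in (25); relies on F being strictly decreasing."""
    lo = phi_star + 1e-12                         # F is very large just above phi_*(p)
    hi = phi_star + sigma2 * p / (2.0 * b) + 1.0  # here 0.5 * sigma2 * F(hi, p) < b
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if 0.5 * sigma2 * F(mid, p, phi) > b:
            lo = mid                              # F too large, so gamma must increase
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Assumed example: c(x) = x^2 on [0, inf), hence phi(u) = u^2 / 4.
phi = lambda u: u * u / 4.0
gamma = solve_gamma(p=1.0, b=1.0, sigma2=1.0, phi=phi)
print("gamma(p) is approximately", gamma)
```

For this particular $\phi$ the integral in (23) has the closed form
$(2/\sqrt{\gamma}) \arctan(p/(2\sqrt{\gamma}))$, which gives an independent
check on the quadrature.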

For each $p > 0$ we now define a function $G(\cdot,p) : [0,p] \to [0,b]$ via

$G(v,p) = \frac{1}{2}\sigma^2 \int_0^v \frac{du}{\phi(u) + \gamma(p)}$ for $v \in [0,p]$.

Fig. 4. The function $F$ and optimal average cost $\gamma(p)$.

Clearly, $G(\cdot,p)$ is strictly increasing and continuously differentiable
on $(0,p)$. Therefore, its inverse is well defined. The following
proposition is then immediate.

Proposition 6. For each $p > 0$ let $v(\cdot,p) : [0,b] \to [0,p]$ be the
inverse of $G(\cdot,p)$. Then $v(\cdot,p)$ is strictly increasing and
continuously differentiable on $(0,b)$.

For each $p > 0$ define a function $f(\cdot,p)$ via

(26) $f(z,p) = \int_0^z v(y,p)\,dy$ for $z \in [0,b]$.

The following proposition characterizes a solution of the Bellman equation
explicitly.

Proposition 7. For each $p > 0$ the function $f(\cdot,p)$ is nonnegative,
nondecreasing, strictly convex and belongs to $C^2[0,b]$. Moreover, the pair
$(f(\cdot), \gamma) = (f(\cdot,p), \gamma(p))$ satisfies the Bellman equation
(19) with boundary conditions (15).

Proof. It is immediate from (26) and Proposition 6 that $f(\cdot,p)$ is
nonnegative, nondecreasing, strictly convex and twice continuously
differentiable on $(0,b)$. To show that
$(f(\cdot), \gamma) = (f(\cdot,p), \gamma(p))$ satisfies the Bellman equation
(19) with boundary conditions (15) we can equivalently show that
$v(\cdot,p)$ and $\gamma(p)$ satisfy (20) and the boundary conditions

(27) $v(0) = 0$ and $v(b) = p$.

First, observe that

$v(0,p) = G^{-1}(0,p) = 0$ because $G(0,p) = 0$,
$v(b,p) = G^{-1}(b,p) = p$ because $G(p,p) = b$.

Therefore, (27) holds. Also, for $z \in (0,b)$

$v'(z,p) = \frac{d}{dz}[G^{-1}(z,p)] = \left.\frac{1}{(d/dy)G(y,p)}\right|_{y = v(z,p)} = \frac{2}{\sigma^2}[\phi(v(z,p)) + \gamma(p)],$

so that

$\gamma(p) = \tfrac{1}{2}\sigma^2 v'(z,p) - \phi(v(z,p))$. □

3.4. Optimality of the candidate policy. Our candidate policy is the one
that chooses, in each state $z \in [0,b]$, the following negative drift rate:

(28) $\theta(z,p) = \psi(v(z,p))$.

From the monotonicity and left-continuity of $\psi(\cdot)$, and the
monotonicity and continuity of $v(\cdot,p)$, it follows that
$\theta(\cdot,p)$ is left-continuous and nondecreasing. It is easy to see
that $\theta(\cdot,p)$ is measurable and bounded, and hence is admissible.
Now let $\theta(\cdot)$ be an arbitrary admissible policy, and let $T$ be its
associated differential operator defined by (12). From Proposition 7 and the
form of the Bellman equation (19) one sees that

(29) $\gamma(p) \le Tf(z,p) + c(\theta(z))$ for all $z \in (0,b)$.

To facilitate comparison, let $T^*$ be the differential operator associated
with our candidate policy and let $\xi^* = \{\xi^*(t), t \ge 0\}$ be its
associated cumulative cost process as in (5). Now Proposition 7, the Bellman
equation (19) and the definition (28) give us

(30) $\gamma(p) = T^*f(z,p) + c(\theta(z,p))$ for all $z \in (0,b)$.

The following is then immediate from Propositions 3 and 4.

Theorem 1. The candidate policy is optimal in the following sense: if
$\theta(\cdot)$ is any other admissible policy and
$\xi = \{\xi(t), t \ge 0\}$ is its cumulative cost process, then

(31) $\gamma(p) = \lim_{t \to \infty} \frac{1}{t} E[\xi^*(t)] \le \liminf_{t \to \infty} \frac{1}{t} E[\xi(t)]$.

Because we have restricted attention thus far to stationary Markov policies
$\theta(\cdot)$ that are moreover bounded, it is easy to show that the
liminf in (31) is actually achieved as a limit for any admissible policy.
3.5. Average rejection rate $\beta(p)$. Fixing attention on the optimal
policy $\theta(\cdot,p)$ defined by (28), let $(X^*, Z^*, L^*, U^*)$ be a
solution of the associated Skorohod problem, and let $T^*$ be the associated
differential operator, defined by (12) with $\theta(\cdot,p)$ in place of
$\theta(\cdot)$. We shall now compute
$\lim_{t \to \infty} \frac{1}{t} E[U^*(t)]$ under this optimal policy. To this
end, for each $p > 0$ we first consider the following differential equation,
whose unknowns are a constant $\beta(p)$ and a continuously differentiable
function $u(\cdot,p)$ defined on $[0,b]$:

(32) $\tfrac{1}{2}\sigma^2 u'(z,p) - \theta(z,p)\,u(z,p) - \beta(p) = 0,$

(33) $u(0,p) = 0$ and $u(b,p) = 1$.

The proof of the following proposition is straightforward, and hence is
omitted.

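Proposition 8. For each $p > 0$ the solution of (32)-(33) is given below:

(34) $\beta(p) = \dfrac{\sigma^2 \exp\{-\int_0^b (2\theta(z,p)/\sigma^2)\,dz\}}{2 \int_0^b \exp\{-\int_0^y (2\theta(z,p)/\sigma^2)\,dz\}\,dy},$

(35) $u(z,p) = \dfrac{2\beta(p)}{\sigma^2} \int_0^z \exp\left\{\int_y^z (2\theta(s,p)/\sigma^2)\,ds\right\} dy$ for $z \in [0,b]$.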
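Formula (34) is straightforward to evaluate numerically for a given drift
policy. The Python sketch below uses the trapezoid rule on a uniform grid;
the function name, the grid size and the constant test policies are
assumptions chosen for illustration. For a constant drift $\theta$, (34)
reduces to $\theta/(\exp\{2\theta b/\sigma^2\} - 1)$, and to
$\sigma^2/(2b)$ when $\theta = 0$, which gives a simple check.

```python
import math


def rejection_rate(theta, b, sigma2, n=2000):
    """Evaluate (34) by the trapezoid rule on a uniform grid of n steps."""
    h = b / n
    # big_theta[i] approximates the integral of 2*theta/sigma2 over [0, i*h].
    big_theta = [0.0]
    for i in range(n):
        left, right = theta(i * h), theta((i + 1) * h)
        big_theta.append(big_theta[-1] + h * (left + right) / sigma2)
    weights = [math.exp(-t) for t in big_theta]
    # Trapezoid rule for the outer integral in the denominator of (34).
    denom = h * (sum(weights) - 0.5 * (weights[0] + weights[-1]))
    return sigma2 * weights[-1] / (2.0 * denom)


# Assumed check cases: constant drift 1 and zero drift, with b = sigma2 = 1.
beta_const = rejection_rate(lambda z: 1.0, b=1.0, sigma2=1.0)
beta_zero = rejection_rate(lambda z: 0.0, b=1.0, sigma2=1.0)
print(beta_const, 1.0 / (math.exp(2.0) - 1.0))
print(beta_zero, 0.5)
```

A state-dependent policy such as $\theta(\cdot,p)$ from (28) can be passed in
the same way once $v(\cdot,p)$ has been computed.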

The following proposition characterizes the average rejection rate under
the optimal policy $\theta(\cdot,p)$ defined by (28).

Proposition 9. The constant $\beta(p)$ defined by (34) is the average
rejection rate under the optimal policy $\theta(\cdot,p)$. That is,

(36) $\lim_{t \to \infty} \frac{1}{t} E[U^*(t)] = \beta(p)$.

Moreover,

(37) $\gamma(p) \ge p\,\beta(p)$.

Proof. Fix $p > 0$ and define a function $g(\cdot)$ via

(38) $g(z) = \int_0^z u(y,p)\,dy$ for $z \in [0,b]$.

From (13) we have that

(39) $E[g(Z^*(t))] - g(Z^*(0)) = E\left[\int_0^t T^*g(Z^*(s))\,ds + g'(0)L^*(t) - g'(b)U^*(t)\right].$
Now (32) is equivalently expressed as $T^*g(z) = \beta(p)$, $z \in [0,b]$,
whereas (33) says that $g'(0) = 0$ and $g'(b) = 1$. Substituting these
relationships in (39) gives

(40) $E[g(Z^*(t))] - g(Z^*(0)) = \beta(p)\,t - E[U^*(t)].$

Dividing both sides of (40) by $t$ and letting $t \to \infty$ establishes
(36). Combining this with the probabilistic interpretation of $\gamma(p)$
provided in Theorem 1, we have that

(41) $\gamma(p) = \lim_{t \to \infty} \frac{1}{t} E\left[\int_0^t c(\theta(Z^*(s),p))\,ds\right] + p\,\beta(p),$

and then (37) follows from the assumed nonnegativity of $c(\cdot)$. □

4. Application to power control in wireless communication. In this section
we describe a problem of dynamic power control in wireless communication,
which can be studied using the machinery developed in this paper. The system
manager dynamically chooses a state-dependent transmission rate on a static,
point-to-point wireless link by varying transmission power over time. To the
best of our knowledge, the first study that explores power and delay
trade-offs using dynamic programming techniques is the Ph.D. dissertation of
Berry [5] (also see [6]). He uses a discrete-time Markov chain model to study
a dynamic power control problem and develops structural results regarding
the optimal control policy.

We model the wireless link as a simple queueing system: packets requiring
transmission arrive in a stationary process at some average rate
$\lambda > 0$; they are stored in a buffer having a finite capacity $b$ (see
below); and they are transmitted on a first-in-first-out basis at a rate
which depends on the power level chosen. We denote by $Z(t)$ the number of
packets stored in the buffer at time $t$, calling this the "buffer content."
Alternatively, one may adopt a larger unit of measurement, such as hundreds
of packets, in describing buffer content, buffer capacity and data flow
rates. That kind of scaling is quite natural in the wireless communication
context, and it accords well with the standard line of argument used to
justify or motivate diffusion models in the literature of applied
probability. However, the choice of unit is irrelevant for purposes of actual
model application, and it is linguistically simpler to speak in terms of
unscaled quantities, so we shall continue to do so in the following
discussion.

We use the term "nominal power level" to mean the power level that produces
an average transmission rate (or average output rate) precisely equal to the
average input rate $\lambda$. If the system manager were to keep the power
level at its nominal value regardless of circumstances, then the buffer
content process $Z = \{Z(t), t \ge 0\}$ could be reasonably modeled as a
one-dimensional reflected Brownian motion with zero drift and bounded state
space $[0,b]$; see Chapter 2 of [9]. That is precisely the system model
(1)-(4) considered in this paper, except that the drift $\theta(\cdot)$ is
identically zero in the artificial control scenario considered thus far; in
the current context one interprets $U(t)$ as the cumulative number of packets
dropped up to time $t$ due to finite buffer capacity, and $L(t)$ as the
cumulative number of potential packet transmissions "lost" up to time $t$ due
to emptiness of the buffer.

Uysal-Biyikoglu, Prabhakar and El Gamal [15] have emphasized the trade-off
between energy consumption and transmission speed in wireless
communication: lower transmission power leads to lower energy consumption
but also to slower transmission and hence longer delays. Because information
is delay sensitive, the system manager would like to impose an upper bound
constraint on the delays experienced by packets that pass through the
system. Such a formulation is not meaningful in a conventional model,
because packet delays are random variables, but in the "heavy traffic"
parameter regime where Brownian models play a prominent role, Plambeck,
Kumar and Harrison [14] have argued that an upper bound constraint on buffer
contents is very nearly equivalent to an upper bound constraint on packet
delays. To be specific, requiring that packet delays be $\le d$ in the
wireless communication setting is roughly equivalent to requiring
$Z(t) \le \lambda d$. That is, by dropping packets whenever the buffer
content reaches $b = \lambda d$, the system manager can enforce an upper
bound of approximately $d$ on the delays experienced by accepted packets.
Thus the "buffer capacity" $b$ in our model is not a physical parameter, but
rather a policy parameter derived from a performance constraint.

Continuing to develop our Brownian formulation of the power control
problem, we hypothesize a system manager who observes the buffer content
$Z$ and dynamically adjusts transmission power. An increase in power from
the nominal level produces a positive value of the negative drift rate
$\theta$ in the main system equation (1), and in symmetric fashion, a
decrease from the nominal level produces a negative value of $\theta$, hence
positive drift. The energy consumption rate associated with a negative drift
rate $x$ is denoted $c(x)$. Therefore, given a control policy
$\theta(\cdot)$, energy consumption up to time $t$ is
$\int_0^t c(\theta(Z(s)))\,ds$. In [15] it was argued, based on
information-theoretic principles, that the physically correct choice of the
cost function $c(\cdot)$ has the form $c(x) = \exp\{a(x - \theta_*)\} - 1$
for $x \ge \theta_*$, where $a > 0$ is a constant.
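For this exponential cost the quantities $\psi$ and $\phi$ of Section 2 are
available in closed form. Assuming $A = [\theta_*, \infty)$, one has
$c'(x) = a \exp\{a(x - \theta_*)\}$, so $y_* = c'(\theta_*) = a$,
$\psi(y) = \theta_*$ for $0 \le y \le a$ and
$\psi(y) = \theta_* + \ln(y/a)/a$ for $y \ge a$. The Python sketch below
codes these formulas and compares $\phi$ with a brute-force evaluation of
(9) on a truncated grid; the parameter values $a = 1$, $\theta_* = -1$, the
grid and the function names are illustrative assumptions.

```python
import math


def psi_exp(y, a, theta_star):
    """Maximizer in (10) for c(x) = exp(a (x - theta_*)) - 1 on [theta_*, inf)."""
    if y <= a:                       # y_* = c'(theta_*) = a
        return theta_star
    return theta_star + math.log(y / a) / a


def phi_exp(y, a, theta_star):
    """phi(y) from (9), evaluated at the maximizer psi(y)."""
    x = psi_exp(y, a, theta_star)
    return y * x - (math.exp(a * (x - theta_star)) - 1.0)


def phi_grid(y, a, theta_star, width=10.0, step=1e-3):
    """Brute-force version of (9) on the truncated grid [theta_*, theta_* + width]."""
    n = int(round(width / step))
    return max(y * (theta_star + i * step) - (math.exp(a * i * step) - 1.0)
               for i in range(n + 1))


a, theta_star = 1.0, -1.0            # assumed parameter values
for y in [0.5, 1.0, 3.0]:
    print(y, psi_exp(y, a, theta_star), phi_exp(y, a, theta_star),
          phi_grid(y, a, theta_star))
```

The grid truncation is harmless here because the objective in (9) is concave
in $x$ and its maximizer lies well inside the grid for the values of $y$
shown.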

It remains to specify the system manager's objective, and a natural
formulation is the following: choose a control policy $\theta(\cdot)$ to
minimize long-run average energy consumption subject to an upper bound of
$\bar\beta$ on the long-run average packet drop rate. Mathematically, this is
expressed as follows:

(42) minimize $\limsup_{t \to \infty} E\left[\frac{1}{t} \int_0^t c(\theta(Z(s)))\,ds\right],$

subject to (1)-(4) plus the performance constraint

(43) $\limsup_{t \to \infty} \frac{1}{t} E[U(t)] \le \bar\beta$.

Fig. 5. The rejection rate under an optimal policy.

We have arrived at what might be called a constrained Brownian control 
formulation of the system manager's problem. Altman [1] develops a general 
approach to solving constrained Markov decision problems in a discrete-time 
framework. Relaxing the constraints that appear in the original problem 
formulation, he derives an equivalent "Lagrangian" problem. Proceeding 
in that way, one may relax the constraint (43) in our Brownian control 
problem and incorporate congestion concerns through a cost component 
in the objective. This gives rise to the problem formulation introduced in 
Section 1, where one can interpret the penalty rate p as the "Lagrange 
multiplier" associated with the performance constraint (43). 

In order to carry out that program, we study the parametric dependence 
of the solution developed in Section 3 on the penalty rate p. First, define 
the following constants: 

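(44) $p_0 = \sup\{y \ge 0 : \psi(y) = \theta_*\};$

(45) $\beta_* = \dfrac{\theta_*}{\exp\{2\theta_* b/\sigma^2\} - 1}$

and

(46) $\beta^* = \dfrac{\theta^*}{\exp\{2\theta^* b/\sigma^2\} - 1}$ if $A$ has a maximal element $\theta^*$, and $\beta^* = 0$ if $A$ is unbounded.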

Recall that (34) gives an explicit formula for the average packet drop rate
$\beta(p)$ under an optimal policy. It is intuitively clear that
$\beta(\cdot)$ is continuous on $(0,\infty)$; it is constant over $(0,p_0]$
and strictly decreasing on $[p_0,\infty)$ with
$\lim_{p \downarrow 0} \beta(p) = \beta_*$ and
$\lim_{p \to \infty} \beta(p) = \beta^*$. These assertions as well as some
other related results are proved in Section 3.3.6 of [2]. Figure 5 shows an



$\beta(p^*) = \bar\beta$, it is a straightforward matter to verify that the
candidate policy given in (28) associated with the penalty rate $p^*$ is an
optimal solution for our Brownian control problem with performance
constraint (43); details of this verification are spelled out in Section 3.4
of [2].

We have made no attempt to justify our Brownian formulation of the 
power control problem as the "heavy traffic limit" of a conventional queueing- 
theoretic formulation. It seems likely that the limit theory developed in [12] 
can be adapted for that purpose. In particular, Section 9.3 deals with er- 
godic control problems like ours, but interpreting and verifying the various 
assumptions employed in that development is not a simple matter. Also, our 
"constrained" formulation of the original power control problem lies outside 
the framework used in [12], and accommodating that element would create 
another level of complexity in developing a rigorous limit theory. 